Abstract

This paper studies the noncausal relay channel, also known as the relay channel with unlimited lookahead, introduced by 
El Gamal, Hassanpour, and Mammen. Unlike the standard relay channel model, where the relay encodes its signal based on 
the previous received output symbols, the relay in the noncausal relay channel encodes its signal as a function of the entire 
received sequence. In the existing coding schemes, the relay uses this noncausal information solely to recover the transmitted 
message and then cooperates with the sender to communicate this message to the receiver. However, it is shown in this 
paper that by applying the Gelfand-Pinsker coding scheme, the relay can take further advantage of the noncausally available 
information, which can achieve strictly higher rates than existing coding schemes. This paper also provides a new upper bound
on the capacity of the noncausal relay channel that strictly improves upon the cutset bound. These new lower and upper bounds on the
capacity coincide for the class of degraded noncausal relay channels and establish the capacity for this class. 

I. Introduction 

The relay channel was first introduced by van der Meulen 0. In their classic paper JS], Cover and El Gamal established 
the cutset upper bound and the decode-forward, partial decode-forward, and compress-forward lower bounds for the relay 
channel. Furthermore, they established the capacity of the degraded and reversely degraded relay channels and relay channels 
with feedback. 

The relay channel with lookahead was introduced by El Gamal, Hassanpour, and Mammen [3], who mainly studied the
following two classes: 

• Noncausal relay channel (also known as the relay channel with unlimited lookahead) in which the relay knows its 
entire received sequence in advance and hence the relaying functions can depend on the whole received block. Lower 
bounds on the capacity were established by extending the (partial) decode-forward coding scheme to the noncausal case.
The cutset upper bound for the noncausal relay channel was also established. 

• Causal relay channel (also known as the relay-without-delay channel) in which the relay has access only to the past 
and present received sequence. A lower bound for the capacity of this channel was established by combining partial 
decode-forward and instantaneous relaying coding schemes. The cutset upper bound for the causal relay channel was 
also established. 

The focus of this paper is on the noncausal relay channel. The existing lower bounds on the capacity of this channel 
are derived using the (partial) decode-forward coding scheme. In particular, the relay recovers (the part of) the transmitted
message from the received sequence (available noncausally at the relay) and then cooperates with the sender to coherently
transmit this message to the receiver. Therefore, the noncausally available information is used solely to recover (the part of)
the transmitted message at the relay. However, the channel conditional pmf can allow the relay to take further advantage 
of the received sequence by considering it as noncausal side information to help the relay's communication to the receiver. 
In this paper, we establish several improved lower bounds on the capacity of the noncausal relay channel based on this 
observation by combining the Gelfand-Pinsker coding scheme with (partial) decode-forward and compress-forward at the 
relay. Moreover, we establish a new upper bound on the capacity that improves upon the cutset bound [1, Theorem 17.6].
The new upper bound is shown to be optimal for the class of degraded noncausal relay channels and is achieved by the 
Gelfand-Pinsker decode-forward coding scheme. 

The remainder of this paper is organized as follows. In Section II, we formulate the problem and provide a brief overview
of the existing results. In Section III, we establish three improved lower bounds: the Gelfand-Pinsker decode-forward (GP-DF)
lower bound, the Gelfand-Pinsker compress-forward lower bound, and the Gelfand-Pinsker partial decode-forward
compress-forward lower bound. We show through Example 1 that the GP-DF lower bound can be strictly tighter than the
existing lower bound. In Section IV, we establish an improved upper bound on the capacity, which is shown to strictly
improve upon the cutset bound in Example [5]. The improved upper bound together with the GP-DF lower bound establishes
the capacity of the degraded noncausal relay channels.

Throughout the paper, we follow the notation in [1]. In particular, a random variable is denoted by an upper case letter
(e.g., $X, Y, Z$) and its realization is denoted by a lower case letter (e.g., $x, y, z$). By convention, $X = \emptyset$ means that $X$ is a
degenerate random variable (unspecified constant) regardless of its support. Let $X_k^n = (X_{k1}, X_{k2}, \ldots, X_{kn})$. We say that
$X \to Y \to Z$ form a Markov chain if $p(x, y, z) = p(x)p(y|x)p(z|y)$. For $a > 0$, $[1 : 2^a] = \{1, 2, \ldots, 2^{\lceil a \rceil}\}$, where $\lceil a \rceil$ is
the smallest integer greater than or equal to $a$. For any set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality. The probability of an event $\mathcal{A}$ is
denoted by $P(\mathcal{A})$.

II. Problem Formulation and Known Results 

A. Noncausal Relay Channels 

Consider the 3-node point-to-point communication system with a relay depicted in Figure 1. The sender (node 1) wishes
to communicate a message $M$ to the receiver (node 3) with the help of the relay (node 2). The discrete memoryless (DM)
relay channel with lookahead is described as

$$(\mathcal{X}_1 \times \mathcal{X}_2,\; p(y_2|x_1)\,p(y_3|x_1, x_2, y_2),\; \mathcal{Y}_2 \times \mathcal{Y}_3,\; l), \tag{1}$$

where the parameter $l \in \mathbb{Z}$ specifies the amount of lookahead. The channel is memoryless in the sense that
$p(y_{2i}|x_1^i, y_2^{i-1}, m) = p_{Y_2|X_1}(y_{2i}|x_{1i})$ and
$p(y_{3i}|x_1^i, x_2^i, y_2^i, y_3^{i-1}, m) = p_{Y_3|X_1,X_2,Y_2}(y_{3i}|x_{1i}, x_{2i}, y_{2i})$.

A $(2^{nR}, n)$ code for the relay channel with lookahead consists of

• a message set $[1 : 2^{nR}]$,

• an encoder that assigns a codeword $x_1^n(m)$ to each message $m \in [1 : 2^{nR}]$,

• a relay encoder that assigns a symbol $x_{2i}(y_2^{i+l})$ to each sequence $y_2^{i+l}$ for $i \in [1 : n]$, where the symbols that have
nonpositive time indices or time indices greater than $n$ are arbitrary, and

• a decoder that assigns an estimate $\hat{m}(y_3^n)$ or an error message e to each received sequence $y_3^n$.

We assume that the message $M$ is uniformly distributed over $[1 : 2^{nR}]$. The average probability of error is defined as
$P_e^{(n)} = P\{\hat{M} \ne M\}$. A rate $R$ is said to be achievable for the DM relay channel with lookahead if there exists a sequence
of $(2^{nR}, n)$ codes such that $\lim_{n \to \infty} P_e^{(n)} = 0$. The capacity $C_l$ of the DM relay channel with lookahead is the supremum
of all achievable rates.

Fig. 1. Relay channel with lookahead $l \in \mathbb{Z}$

The standard DM relay channel¹ corresponds to lookahead parameter $l = -1$, or equivalently, a delay of 1. The noncausal
relay channel which we focus on in this paper is the case where $l$ is unbounded, i.e., the relaying functions can depend on
the entire received sequence $y_2^n$. The purpose of studying this extreme case is to quantify the limit on the potential gain
from relaying.
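As a rough illustration of the lookahead parameter, the following sketch lists which relay outputs are available when the symbol $x_{2i}$ is formed. The function name `relay_view` and the use of `None` for unlimited lookahead are illustrative assumptions, not notation from the paper.

```python
def relay_view(y2, i, lookahead):
    """Return the relay outputs available when x_{2i} is produced.

    Times are 1-indexed as in the text: the relay sees y_2^{i+l},
    clipped to the block [1 : n]. lookahead=None is a hypothetical
    encoding of unlimited lookahead (the noncausal case).
    """
    n = len(y2)
    if lookahead is None:
        return list(y2)
    last = max(0, min(n, i + lookahead))
    return list(y2[:last])


y2 = ["a", "b", "c", "d"]
standard = relay_view(y2, 3, -1)      # l = -1: past symbols only
causal = relay_view(y2, 3, 0)         # l = 0: past and present symbols
noncausal = relay_view(y2, 3, None)   # unbounded l: the whole block
```

Under these assumptions, the standard relay at time 3 sees two symbols, the causal relay sees three, and the noncausal relay sees the entire block regardless of the time index.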

B. Prior Work on the Noncausal Relay Channel 

The noncausal relay channel was initially studied by El Gamal, Hassanpour, and Mammen [3], who established the
following lower bounds and cutset upper bound on the capacity $C_\infty$.

• Decode-forward (DF) lower bound:

$$C_\infty \ge R_{\mathrm{DF}} = \max_{p(x_1, x_2)} \min\{I(X_1; Y_2),\; I(X_1, X_2; Y_3)\}, \tag{2}$$

• Partial decode-forward (PDF) lower bound:

$$C_\infty \ge R_{\mathrm{PDF}} = \max_{p(v, x_1, x_2)} \min\{I(X_1, X_2; Y_3),\; I(V; Y_2) + I(X_1; Y_3|X_2, V)\}, \tag{3}$$

• Cutset bound² for the noncausal relay channel:

$$C_\infty \le R_{\mathrm{CS}} = \max_{p(x_1)p(x_2|x_1, y_2)} \min\{I(X_1, X_2; Y_3),\; I(X_1; Y_2) + I(X_1; Y_3|X_2, Y_2)\}. \tag{4}$$

III. Lower Bounds 

In this section, we establish three lower bounds by considering the received $y_2^n$ sequence at the relay as noncausal side
information to help communication. In Subsection III-A, we first establish the Gelfand-Pinsker decode-forward (GP-DF)
lower bound by incorporating Gelfand-Pinsker coding with the decode-forward coding scheme. Then, we show in Example 1 that the
GP-DF lower bound can be strictly tighter than the decode-forward lower bound and can achieve the capacity.
In Subsection III-B, we establish the Gelfand-Pinsker compress-forward (GP-CF) lower bound via two different coding
schemes. In Subsection III-C, we further combine the coding scheme for the GP-CF lower bound with the partial
decode-forward coding scheme.

¹ Note that here we define the relay channel with lookahead as $p(y_2|x_1)p(y_3|x_1, x_2, y_2)$, since the conditional pmf $p(y_2, y_3|x_1, x_2)$ depends on the
code due to the instantaneous or lookahead dependency of $X_2$ on $Y_2$.

² There is a small typo in [3, Theorem 1], where the maximum is over $p(x_1, x_2)$ instead of $p(x_1)p(x_2|x_1, y_2)$.
A. Gelfand-Pinsker Decode-Forward Lower Bound

We first incorporate Gelfand-Pinsker coding with the decode-forward coding scheme. 

Theorem 1 (Gelfand-Pinsker decode-forward (GP-DF) lower bound). The capacity of the noncausal relay channel is lower
bounded as

$$C_\infty \ge R_{\mathrm{GP\text{-}DF}} = \max \min\{I(X_1; Y_2),\; I(X_1, U; Y_3) - I(U; Y_2|X_1)\}, \tag{5}$$

where the maximum is over all pmfs $p(x_1)p(u|x_1, y_2)$ and functions $x_2(u, x_1, y_2)$.

Proof: The GP-DF coding scheme uses multicoding and joint typicality encoding and decoding. For each message
$m$, we generate a sequence $x_1^n(m)$ and a subcodebook $\mathcal{C}(m)$ of $2^{n\tilde{R}}$ sequences $u^n(l|m)$. To send message $m$, the sender
transmits $x_1^n(m)$. Upon receiving $y_2^n$ noncausally, the relay first finds a message estimate $\tilde{m}$. It then finds a $u^n(l|\tilde{m}) \in \mathcal{C}(\tilde{m})$
that is jointly typical with $(x_1^n(\tilde{m}), y_2^n)$ and transmits $x_2^n(u^n(l|\tilde{m}), x_1^n(\tilde{m}), y_2^n)$. The receiver declares $\hat{m}$ to be the message
estimate if $(x_1^n(\hat{m}), u^n(l|\hat{m}), y_3^n)$ are jointly typical for some $u^n(l|\hat{m}) \in \mathcal{C}(\hat{m})$. We now provide the details of the proof.

Codebook generation: Fix $p(x_1)p(u|x_1, y_2)$ and $x_2(u, x_1, y_2)$ that attain the lower bound. Randomly and independently
generate $2^{nR}$ sequences $x_1^n(m)$, $m \in [1 : 2^{nR}]$, each according to $\prod_{i=1}^n p_{X_1}(x_{1i})$. For each message $m \in [1 : 2^{nR}]$,
randomly and conditionally independently generate $2^{n\tilde{R}}$ sequences $u^n(l|m)$, each according to $\prod_{i=1}^n p_{U|X_1}(u_i|x_{1i}(m))$,
which form the subcodebook $\mathcal{C}(m)$. This defines the codebook
$\mathcal{C} = \{(x_1^n(m), u^n(l|m), x_2^n(u^n(l|m), x_1^n(m), y_2^n)) : m \in [1 : 2^{nR}],\, l \in [1 : 2^{n\tilde{R}}]\}$. The codebook is revealed to all parties.

Encoding: To send message $m$, the encoder transmits $x_1^n(m)$.

Relay encoding: Upon receiving $y_2^n$ noncausally, the relay first finds the unique message $\tilde{m}$ such that
$(x_1^n(\tilde{m}), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. Then, it finds a sequence $u^n(l|\tilde{m}) \in \mathcal{C}(\tilde{m})$ such that
$(u^n(l|\tilde{m}), x_1^n(\tilde{m}), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. The relay transmits
$x_{2i} = x_2(u_i(l|\tilde{m}), x_{1i}(\tilde{m}), y_{2i})$ at time $i \in [1 : n]$.

Decoding: Let $\epsilon > \epsilon'$. Upon receiving $y_3^n$, the decoder declares that $\hat{m} \in [1 : 2^{nR}]$ is sent if it is the unique message such
that $(x_1^n(\hat{m}), u^n(l|\hat{m}), y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}$ for some $u^n(l|\hat{m}) \in \mathcal{C}(\hat{m})$; otherwise, it declares an error.

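Analysis of the probability of error: We analyze the probability of decoding error averaged over codes. Assume without loss
of generality that $M = 1$. Let $\tilde{M}$ be the relay's message estimate and let $L$ denote the index of the chosen $U^n$ codeword
for $\tilde{M}$ and $Y_2^n$. The decoder makes an error only if one of the following events occurs:

$\tilde{\mathcal{E}} = \{\tilde{M} \ne 1\}$,
$\tilde{\mathcal{E}}_1 = \{(X_1^n(1), Y_2^n) \notin \mathcal{T}_{\epsilon'}^{(n)}\}$,
$\tilde{\mathcal{E}}_2 = \{(X_1^n(m), Y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)} \text{ for some } m \ne 1\}$,
$\tilde{\mathcal{E}}_3 = \{(U^n(l|\tilde{M}), X_1^n(\tilde{M}), Y_2^n) \notin \mathcal{T}_{\epsilon'}^{(n)} \text{ for all } U^n(l|\tilde{M}) \in \mathcal{C}(\tilde{M})\}$,
$\mathcal{E}_1 = \{(X_1^n(1), U^n(L|1), Y_3^n) \notin \mathcal{T}_{\epsilon}^{(n)}\}$,
$\mathcal{E}_2 = \{(X_1^n(m), U^n(l|m), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m \ne 1,\, U^n(l|m) \in \mathcal{C}(m)\}$.

Thus, the probability of error is upper bounded as

$P(\mathcal{E}) = P\{\hat{M} \ne 1\}$
$\le P(\tilde{\mathcal{E}}) + P(\tilde{\mathcal{E}}_3 \cap \tilde{\mathcal{E}}^c) + P(\mathcal{E}_1 \cap \tilde{\mathcal{E}}^c \cap \tilde{\mathcal{E}}_3^c) + P(\mathcal{E}_2)$
$\le P(\tilde{\mathcal{E}}_1) + P(\tilde{\mathcal{E}}_2) + P(\tilde{\mathcal{E}}_3 \cap \tilde{\mathcal{E}}^c) + P(\mathcal{E}_1 \cap \tilde{\mathcal{E}}^c \cap \tilde{\mathcal{E}}_3^c) + P(\mathcal{E}_2)$.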

By the law of large numbers (LLN), the first term tends to zero as $n \to \infty$. By the packing lemma [1, p. 3.18], the second term
tends to zero as $n \to \infty$ if $R < I(X_1; Y_2) - \delta(\epsilon')$. Therefore, $P(\tilde{\mathcal{E}})$ tends to zero as $n \to \infty$ if $R < I(X_1; Y_2) - \delta(\epsilon')$. Given
$\tilde{\mathcal{E}}^c$, i.e., $\{\tilde{M} = 1\}$, by the covering lemma [1, p. 3.51], the third term tends to zero as $n \to \infty$ if $\tilde{R} > I(U; Y_2|X_1) + \delta(\epsilon')$.
By the conditional typicality lemma, the fourth term tends to zero as $n \to \infty$. Finally, note that once $m$ is wrong, $U^n(l|m)$
is also wrong. By the packing lemma, the last term tends to zero as $n \to \infty$ if $R + \tilde{R} < I(X_1, U; Y_3) - \delta(\epsilon)$. Combining
the bounds and eliminating $\tilde{R}$, we have shown that $P\{\hat{M} \ne 1\}$ tends to zero as $n \to \infty$ if $R < I(X_1; Y_2) - \delta(\epsilon')$ and
$R < I(X_1, U; Y_3) - I(U; Y_2|X_1) - \delta'(\epsilon)$, where $\delta'(\epsilon) = \delta(\epsilon) + \delta(\epsilon')$. This completes the proof. ■

Remark 1. Unlike the coding schemes for the regular relay channel, we do not need block Markov coding for the noncausal
relay channel for two reasons. First, from the channel statistics $p(y_2|x_1)$, $Y_2$ does not depend on $X_2$ and hence there is
no need to make $x_1^n$ correlated with the previous block $x_2^n$. Second, $y_2^n$ is available noncausally at the relay and hence the
signals from the sender and the relay arrive at the receiver in the same block.

Remark 2. Taking $U$ conditionally independent of $Y_2$ given $X_1$ and setting $X_2 = U$, the GP-DF lower bound reduces to
the DF lower bound in (2).

The GP-DF lower bound can be strictly tighter than the DF lower bound as shown in the following example. 

Example 1. Consider a degraded noncausal relay channel $p(y_2|x_1)p(y_3|x_1, x_2, y_2) = p(y_2|x_1)p(y_3|x_2, y_2)$ depicted in
Figure 2. The channel from the sender to the relay is a BEC(1/2) channel, while the channel from the relay to the receiver
is clean if $Y_2 \in \{0, 1\}$ and stuck at 1 if $Y_2$ is an erasure.

Fig. 2. Channel statistics of the degraded noncausal relay channel (panels: $X_1 \to Y_2$; $X_2 \to Y_3$ for $Y_2 = 0, 1$; $X_2 \to Y_3$ for $Y_2 = e$)

Note that the state of the channel from the relay to the receiver, namely, whether we get an erasure or not, is independent
of $X_1$. The first term in both the DF lower bound and the GP-DF lower bound is easy to compute as

$$\max_{p(x_1)} I(X_1; Y_2) = 1/2.$$

Consider the second term in the DF lower bound. Here $X_2$ is chosen such that $Y_2 \to X_1 \to X_2$ form a Markov chain. By
carefully computing the conditional probability $p(y_3|x_1, x_2) = \sum_{y_2} p(y_2|x_1)p(y_3|x_2, y_2)$ in this specific channel, we can
show that $X_1 \to X_2 \to Y_3$ form a Markov chain. Thus,

$\max_{p(x_1, x_2)} I(X_1, X_2; Y_3) \overset{(a)}{=} \max_{p(x_1, x_2)} I(X_2; Y_3)$
$\overset{(b)}{=} \max_{p(x_2)} I(X_2; Y_3)$
$\overset{(c)}{=} H(1/5) - 2/5$
$= 0.3219$,

where (a) follows since $X_1 \to X_2 \to Y_3$ form a Markov chain, (b) follows since $I(X_2; Y_3)$ is fully determined by the
marginal distribution $p(x_2, y_3)$, and (c) follows since the channel from $X_2$ to $Y_3$, $p(y_3|x_2) = \sum_{y_2} p(y_3|x_2, y_2)p(y_2)$, is a Z
channel with crossover probability 1/2 regardless of $p(x_1)$. Thus,

$$R_{\mathrm{DF}} = \min\{1/2,\; 0.3219\} = 0.3219.$$
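The value $H(1/5) - 2/5$ can be checked numerically. The sketch below assumes the reading of the example in which input $X_2 = 0$ is flipped to 1 with probability 1/2 (the erasure event) while $X_2 = 1$ is always received correctly; the grid search and all function names are illustrative, not part of the original derivation.

```python
from math import log2


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def z_channel_mi(q):
    """I(X2; Y3) for the Z channel with crossover 1/2 and q = P{X2 = 0}.

    Y3 = 0 occurs with probability q/2, and H(Y3 | X2) = q * H(1/2) = q.
    """
    return h2(q / 2) - q


# Coarse grid search over the input distribution (illustrative only).
grid = [k / 10000 for k in range(10001)]
q_star = max(grid, key=z_channel_mi)
r_numeric = z_channel_mi(q_star)
r_closed = h2(1 / 5) - 2 / 5
```

Under these assumptions the maximizing input puts probability about 2/5 on $X_2 = 0$, so that $Y_3 = 0$ has probability 1/5, which matches the closed form in step (c).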

Now consider the second term in the GP-DF lower bound (5):

$$\max_{p(x_1)p(u|x_1, y_2),\; x_2(u, x_1, y_2)} \big[I(X_1, U; Y_3) - I(U; Y_2|X_1)\big].$$

Let $U = X_2 = 1$ if $Y_2 = e$, and $U = X_2 \sim \mathrm{Bern}(1/2)$ if $Y_2 = 0, 1$. Note that here we always have $Y_3 = X_2 = U$ and
$X_1 \to Y_2 \to X_2$ form a Markov chain. Thus,

$I(X_1, U; Y_3) - I(U; Y_2|X_1) = I(X_1, X_2; X_2) - I(X_2; Y_2|X_1)$
$= H(X_2) - H(X_2|X_1) + H(X_2|Y_2, X_1)$
$= I(X_1; X_2) + H(X_2|Y_2)$
$\ge H(X_2|Y_2)$
$= 1/2$.

Therefore,

$$R_{\mathrm{GP\text{-}DF}} = 1/2 > R_{\mathrm{DF}} = 0.3219.$$

Moreover, it is easy to see from the cutset bound (4) that the rate 1/2 is also an upper bound, and hence $C_\infty = 1/2$.
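The GP-DF computation in this example can also be verified directly from the joint pmf. The sketch below assumes $X_1 \sim \mathrm{Bern}(1/2)$ (which attains the first term) together with the choice $U = X_2 = Y_3$ described above; the helper `entropy` and the tuple layout `(x1, y2, x2)` are illustrative choices.

```python
from math import log2


def entropy(pmf, keep):
    """Entropy in bits of the coordinates listed in `keep`."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


# Joint pmf over (x1, y2, x2), with U = X2 = Y3 as in the example.
pmf = {}
for x1 in (0, 1):
    pmf[(x1, "e", 1)] = 0.5 * 0.5            # erasure: relay sends 1
    for x2 in (0, 1):
        pmf[(x1, x1, x2)] = 0.5 * 0.5 * 0.5  # no erasure: X2 ~ Bern(1/2)

X1, Y2, X2 = 0, 1, 2
first_term = entropy(pmf, [X1]) + entropy(pmf, [Y2]) - entropy(pmf, [X1, Y2])
# I(X1, U; Y3) = H(X2) because Y3 = X2 = U.
i_x1u_y3 = entropy(pmf, [X2])
# I(U; Y2 | X1) = H(X2 | X1) - H(X2 | X1, Y2).
i_u_y2_given_x1 = (entropy(pmf, [X1, X2]) - entropy(pmf, [X1])) - (
    entropy(pmf, [X1, Y2, X2]) - entropy(pmf, [X1, Y2])
)
second_term = i_x1u_y3 - i_u_y2_given_x1
```

Both terms evaluate to 1/2 for this choice, consistent with the chain of equalities above (here $I(X_1; X_2) = 0$ and $H(X_2|Y_2) = 1/2$).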

B. Gelfand-Pinsker Compress-Forward Lower Bound 

In this subsection, we first propose a two-stage coding scheme that incorporates Gelfand-Pinsker coding with the
compress-forward coding scheme, and then show that an equivalent lower bound can be established directly by applying the recently developed
hybrid coding scheme at the relay node.

Theorem 2 (Gelfand-Pinsker compress-forward (GP-CF) lower bound). The capacity of the noncausal relay channel is
lower bounded as

$$C_\infty \ge R_{\mathrm{GP\text{-}CF}} = \max \min\{I(X_1; U, Y_3),\; I(X_1, U; Y_3) - I(U; Y_2|X_1)\}, \tag{6}$$

where the maximum is over all pmfs $p(x_1)p(u|y_2)$ and functions $x_2(u, y_2)$.



Outline of the proof: The coding scheme is illustrated in Figure 3. We use Wyner-Ziv binning, multicoding, and joint
typicality encoding and decoding. A description $\hat{y}_2^n$ of $y_2^n$ is constructed at the relay. Since the receiver has side information
$y_3^n$ about $\hat{y}_2^n$, we use binning as in Wyner-Ziv coding to reduce the rate necessary to send $\hat{y}_2^n$. Since the relay has side
information $y_2^n$ of the channel $p(y_3|x_1, x_2, y_2)$, we use multicoding as in Gelfand-Pinsker coding to send the bin index of
$\hat{y}_2^n$ via $u^n$. The decoder first decodes the bin index from $u^n$. It then uses $u^n$ and $y_3^n$ to decode $\hat{y}_2^n$ and $x_1^n(m)$ simultaneously.

Fig. 3. GP-CF coding scheme with binning and multicoding
We now provide the details of the coding scheme. 

Codebook generation: Fix $p(x_1)p(u|y_2)p(\hat{y}_2|y_2)$ and $x_2(u, y_2, \hat{y}_2)$ that attain the lower bound. Randomly and independently
generate $2^{nR}$ sequences $x_1^n(m)$, $m \in [1 : 2^{nR}]$, each according to $\prod_{i=1}^n p_{X_1}(x_{1i})$. Randomly and independently generate
$2^{nR_2}$ sequences $\hat{y}_2^n(k)$, $k \in [1 : 2^{nR_2}]$, each according to $\prod_{i=1}^n p_{\hat{Y}_2}(\hat{y}_{2i})$. Partition the indices $k$ into $2^{n\tilde{R}_2}$ bins $\mathcal{B}(l_m)$. For each $l_m$,
randomly and independently generate $2^{n\hat{R}_2}$ sequences $u^n(l|l_m)$, $l \in [1 : 2^{n\hat{R}_2}]$, each according to $\prod_{i=1}^n p_U(u_i)$, which form the
subcodebook $\mathcal{C}(l_m)$. This defines the codebook
$\mathcal{C} = \{(x_1^n(m), \hat{y}_2^n(k), u^n(l|l_m), x_2^n(u^n, y_2^n, \hat{y}_2^n)) : m \in [1 : 2^{nR}],\, k \in [1 : 2^{nR_2}],\, l_m \in [1 : 2^{n\tilde{R}_2}],\, l \in [1 : 2^{n\hat{R}_2}]\}$.
The codebook is revealed to all parties.

Encoding: To send the message $m$, the encoder transmits $x_1^n(m)$.

Relay encoding and analysis of the probability of error: Upon receiving $y_2^n$, the relay first finds an index $k$ such that
$(\hat{y}_2^n(k), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. This requires $R_2 > I(\hat{Y}_2; Y_2) + \delta(\epsilon')$ by the covering lemma. Upon getting the bin index $l_m$ of $k$, i.e.,
$k \in \mathcal{B}(l_m)$, the relay finds a sequence $u^n(l|l_m) \in \mathcal{C}(l_m)$ such that $(u^n(l|l_m), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. This requires $\hat{R}_2 > I(U; Y_2) + \delta(\epsilon')$
by the covering lemma. The relay transmits $x_2(\hat{y}_{2i}(k), u_i(l|l_m), y_{2i})$ at time $i \in [1 : n]$.

Decoding and analysis of the probability of error: Let $\epsilon > \epsilon'$. Upon receiving $y_3^n$, the decoder finds the unique $l_m$ such
that $(u^n(l|l_m), y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}$ for some $u^n(l|l_m) \in \mathcal{C}(l_m)$. This requires $\tilde{R}_2 + \hat{R}_2 < I(U; Y_3) - \delta(\epsilon)$. The decoder then
finds the unique message $\hat{m}$ such that $(x_1^n(\hat{m}), \hat{y}_2^n(k), y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}$ for some $k \in \mathcal{B}(l_m)$. Let $K$ be the chosen index for
$\hat{Y}_2^n$ at the relay. If $\hat{m} \ne 1$ but $k = K$, this requires $R < I(X_1; \hat{Y}_2, Y_3) - \delta(\epsilon)$. If $\hat{m} \ne 1$ and $k \ne K$, this requires
$R + R_2 - \tilde{R}_2 < I(X_1; Y_3) + I(\hat{Y}_2; X_1, Y_3) - \delta(\epsilon)$. Thus, we establish the following lower bound:

$$C_\infty \ge R'_{\mathrm{GP\text{-}CF}} = \max \min\{I(X_1; \hat{Y}_2, Y_3),\; I(X_1, \hat{Y}_2; Y_3) - I(\hat{Y}_2; Y_2|X_1) + I(U; Y_3) - I(U; Y_2)\}, \tag{7}$$

where the maximum is over all pmfs $p(x_1)p(u|y_2)p(\hat{y}_2|y_2)$ and functions $x_2(u, y_2, \hat{y}_2)$.

Now we show that the two lower bounds (6) and (7) are equivalent. Setting $U = \emptyset$ in $R'_{\mathrm{GP\text{-}CF}}$ and relabeling $\hat{Y}_2$ as $U$, $R'_{\mathrm{GP\text{-}CF}}$
reduces to $R_{\mathrm{GP\text{-}CF}}$. Thus,

$$R'_{\mathrm{GP\text{-}CF}} \ge R_{\mathrm{GP\text{-}CF}}. \tag{8}$$

On the other hand, letting $U = (U, \hat{Y}_2)$ in $R_{\mathrm{GP\text{-}CF}}$, we have

$I(X_1, U, \hat{Y}_2; Y_3) - I(U, \hat{Y}_2; Y_2|X_1)$
$= I(X_1, \hat{Y}_2; Y_3) + I(U; Y_3|X_1, \hat{Y}_2) - I(\hat{Y}_2; Y_2|X_1) - I(U; Y_2|X_1, \hat{Y}_2)$
$= I(X_1, \hat{Y}_2; Y_3) - I(\hat{Y}_2; Y_2|X_1) + H(U|X_1, \hat{Y}_2) - H(U|X_1, \hat{Y}_2, Y_3) - H(U|X_1, \hat{Y}_2) + H(U|X_1, \hat{Y}_2, Y_2)$
$\overset{(a)}{\ge} I(X_1, \hat{Y}_2; Y_3) - I(\hat{Y}_2; Y_2|X_1) + H(U) - H(U|Y_3) - H(U) + H(U|X_1, \hat{Y}_2, Y_2)$
$\overset{(b)}{=} I(X_1, \hat{Y}_2; Y_3) - I(\hat{Y}_2; Y_2|X_1) + I(U; Y_3) - I(U; Y_2)$,

where (a) follows since conditioning reduces entropy and (b) follows since $(X_1, \hat{Y}_2) \to Y_2 \to U$ form a Markov chain.
Furthermore, since the maximum in $R_{\mathrm{GP\text{-}CF}}$ is over a larger set $p(u, \hat{y}_2|y_2)$ than the set $p(u|y_2)p(\hat{y}_2|y_2)$ in $R'_{\mathrm{GP\text{-}CF}}$,

$$R_{\mathrm{GP\text{-}CF}} \ge R'_{\mathrm{GP\text{-}CF}}. \tag{9}$$

Combining (8) and (9) establishes the equivalence.



Remark 3. Taking $U$ independent of $Y_2$ and $X_2 = U$ in (7), we establish the compress-forward lower bound without
Gelfand-Pinsker coding as follows:

$$C_\infty \ge R_{\mathrm{CF}} = \max \min\{I(X_1; \hat{Y}_2, Y_3),\; I(X_1, \hat{Y}_2; Y_3) + I(X_2; Y_3) - I(\hat{Y}_2; Y_2|X_1)\},$$

where the maximum is over all pmfs $p(x_1)p(x_2)p(\hat{y}_2|y_2)$.

In the analysis of the probability of error in Theorem 2, there is a technical subtlety in applying the standard packing
lemma and joint typicality lemma, since the bin index $L_m$, the compression index $K$, and the multicoding index $L$ all
depend on the random codebook itself. In the following, we show that the GP-CF lower bound (6) can be established directly
by applying the recently developed hybrid coding scheme for joint source-channel coding by Lim, Minero, and Kim [5],
[6], [9].

Proof of Theorem 2 via hybrid coding: In this coding scheme, we apply hybrid coding at the relay node as depicted
in Figure 4. The sequence $y_2^n$ is mapped to one of $2^{n\tilde{R}}$ sequences $u^n(l)$. The relay generates the codeword $x_2^n$ through a
symbol-by-symbol mapping $x_2(u, y_2)$. The receiver declares $\hat{m}$ to be the message estimate if $(x_1^n(\hat{m}), u^n(l), y_3^n)$ are jointly
typical for some $l \in [1 : 2^{n\tilde{R}}]$. Similar to the hybrid coding scheme for joint source-channel coding [6], [9], the precise
analysis of the probability of decoding error involves a technical subtlety. In particular, since $U^n(L)$ is used as a source
codeword, the index $L$ depends on the entire codebook. This dependency issue is resolved by the technique developed in
[5]. We now provide the details of the coding scheme.

Fig. 4. Hybrid coding interface at the relay ($Y_2^n$ → vector quantizer → $U^n(L)$ → $x_2(u, y_2)$ → $X_2^n$). Illustration from Kim, Lim, and Minero [9]

Codebook generation: Fix $p(x_1)p(u|y_2)$ and $x_2(u, y_2)$ that attain the lower bound. Randomly and independently generate $2^{nR}$
sequences $x_1^n(m)$, $m \in [1 : 2^{nR}]$, each according to $\prod_{i=1}^n p_{X_1}(x_{1i})$. Randomly and independently generate $2^{n\tilde{R}}$ sequences
$u^n(l)$, $l \in [1 : 2^{n\tilde{R}}]$, each according to $\prod_{i=1}^n p_U(u_i)$. This defines the codebook
$\mathcal{C} = \{(x_1^n(m), u^n(l), x_2^n(u^n(l), y_2^n)) : m \in [1 : 2^{nR}],\, l \in [1 : 2^{n\tilde{R}}]\}$. The codebook is revealed to all parties.

Encoding: To send message $m$, the encoder transmits $x_1^n(m)$.

Relay encoding: Upon receiving $y_2^n$, the relay finds an index $l$ such that $(u^n(l), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. If there is more than one such
index, it chooses one of them at random. If there is no such index, it chooses an arbitrary index at random from $[1 : 2^{n\tilde{R}}]$.
The relay then transmits $x_{2i} = x_2(u_i(l), y_{2i})$ at time $i \in [1 : n]$.
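The relay encoding step can be sketched on a toy instance as follows. This is only an illustration under stated assumptions: the typicality test is a simple robust-typicality check on empirical pair frequencies, the codebook, target pmf, and mapping `x2_map` are hypothetical values chosen for the example, and indices are 0-based in the code while the text uses $[1 : 2^{n\tilde{R}}]$.

```python
import random
from collections import Counter


def jointly_typical(u, y, pmf, eps):
    """Robust typicality check: every pair's empirical frequency must lie
    within eps * p(u, y) of p(u, y); pairs with p = 0 must not occur."""
    n = len(u)
    counts = Counter(zip(u, y))
    for pair in set(pmf) | set(counts):
        p = pmf.get(pair, 0.0)
        if abs(counts.get(pair, 0) / n - p) > eps * p:
            return False
    return True


def relay_encode(y2, codebook, pmf, x2_map, eps, rng):
    """Pick an index l with u^n(l) jointly typical with y2 (random choice
    among matches, random fallback if none), then map symbol by symbol."""
    matches = [l for l, u in enumerate(codebook)
               if jointly_typical(u, y2, pmf, eps)]
    l = rng.choice(matches) if matches else rng.randrange(len(codebook))
    x2 = [x2_map(u_i, y_i) for u_i, y_i in zip(codebook[l], y2)]
    return l, x2


# Toy instance (all values hypothetical): the target pmf makes U copy Y2.
pmf = {(0, 0): 0.5, (1, 1): 0.5}
codebook = [[0, 0, 1, 1], [0, 1, 0, 1]]
y2 = [0, 1, 0, 1]
index, x2 = relay_encode(y2, codebook, pmf, lambda u, y: u ^ y, 0.1,
                         random.Random(0))
```

In this toy instance only the second codeword is jointly typical with `y2`, so it is selected, and the transmitted sequence is obtained by applying the assumed mapping to each pair $(u_i, y_{2i})$.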

Decoding: Let $\epsilon > \epsilon'$. Upon receiving $y_3^n$, the decoder finds the unique message $\hat{m}$ such that $(x_1^n(\hat{m}), u^n(l), y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}$ for
some $l \in [1 : 2^{n\tilde{R}}]$.

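Analysis of the probability of error: We analyze the probability of decoding error averaged over codes. Let $L$ denote the
index of the chosen $U^n$ codeword for $Y_2^n$. Assume without loss of generality that $M = 1$. The decoder makes an error only
if one of the following events occurs:

$\tilde{\mathcal{E}} = \{(U^n(l), Y_2^n) \notin \mathcal{T}_{\epsilon'}^{(n)} \text{ for all } l\}$,
$\mathcal{E}_1 = \{(X_1^n(1), U^n(L), Y_3^n) \notin \mathcal{T}_{\epsilon}^{(n)}\}$,
$\mathcal{E}_2 = \{(X_1^n(m), U^n(L), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m \ne 1\}$,
$\mathcal{E}_3 = \{(X_1^n(m), U^n(l), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m \ne 1,\, l \ne L\}$.

By the union of events bound, the probability of error is upper bounded as

$P(\mathcal{E}) = P(\tilde{\mathcal{E}} \cup \mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3)$
$\le P(\tilde{\mathcal{E}}) + P(\mathcal{E}_1 \cap \tilde{\mathcal{E}}^c) + P(\mathcal{E}_2 \cap \tilde{\mathcal{E}}^c) + P(\mathcal{E}_3)$.

By the covering lemma, the first term tends to zero as $n \to \infty$ if $\tilde{R} > I(U; Y_2) + \delta(\epsilon')$. By the conditional typicality
lemma, the second term tends to zero as $n \to \infty$. By the packing lemma, the third term tends to zero as $n \to \infty$ if
$R < I(X_1; U, Y_3) - \delta(\epsilon)$.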

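The fourth term requires special attention. Consider

$P(\mathcal{E}_3) = P\{(X_1^n(m), U^n(l), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m \ne 1,\, l \ne L\}$
$\overset{(a)}{\le} \sum_{m=2}^{2^{nR}} \sum_{l=1}^{2^{n\tilde{R}}} P\{(X_1^n(m), U^n(l), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)},\, l \ne L\}$
$= \sum_{m=2}^{2^{nR}} \sum_{l=1}^{2^{n\tilde{R}}} \sum_{y_2^n} P\{(X_1^n(m), U^n(l), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)},\, l \ne L \mid Y_2^n = y_2^n\}\, p(y_2^n)$
$\overset{(b)}{\le} 2^{nR}\, 2^{n\tilde{R}} \sum_{y_2^n} P\{(X_1^n(2), U^n(1), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)},\, L \ne 1 \mid Y_2^n = y_2^n\}\, p(y_2^n)$,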
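where (a) follows by the union of events bound and (b) follows by the symmetry of the codebook generation and relay
encoding. Let $\tilde{\mathcal{C}} = \mathcal{C} \setminus \{(X_1^n(2), U^n(1), X_2^n(U^n(1), Y_2^n))\}$. Then, for $n$ sufficiently large,

$P\{(X_1^n(2), U^n(1), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)},\, L \ne 1 \mid Y_2^n = y_2^n\}$
$\le P\{(X_1^n(2), U^n(1), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \mid L \ne 1,\, Y_2^n = y_2^n\}$
$= \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} P\{X_1^n(2) = x_1^n,\, U^n(1) = u^n,\, Y_3^n = y_3^n \mid L \ne 1,\, Y_2^n = y_2^n\}$
$= \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} \sum_{\tilde{c}} P\{X_1^n(2) = x_1^n,\, U^n(1) = u^n,\, Y_3^n = y_3^n \mid L \ne 1,\, Y_2^n = y_2^n,\, \tilde{\mathcal{C}} = \tilde{c}\}\, P\{\tilde{\mathcal{C}} = \tilde{c} \mid L \ne 1,\, Y_2^n = y_2^n\}$
$\overset{(a)}{=} \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} \sum_{\tilde{c}} P\{U^n(1) = u^n \mid L \ne 1,\, Y_2^n = y_2^n,\, \tilde{\mathcal{C}} = \tilde{c}\}\, P\{X_1^n(2) = x_1^n \mid L \ne 1,\, Y_2^n = y_2^n,\, \tilde{\mathcal{C}} = \tilde{c},\, Y_3^n = y_3^n\}$
$\qquad \cdot\, P\{Y_3^n = y_3^n \mid L \ne 1,\, Y_2^n = y_2^n,\, \tilde{\mathcal{C}} = \tilde{c}\}\, P\{\tilde{\mathcal{C}} = \tilde{c} \mid L \ne 1,\, Y_2^n = y_2^n\}$
$\overset{(b)}{\le} \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} \sum_{\tilde{c}} 2\, P\{U^n(1) = u^n\}\, P\{X_1^n(2) = x_1^n\}\, P\{Y_3^n = y_3^n \mid L \ne 1,\, Y_2^n = y_2^n,\, \tilde{\mathcal{C}} = \tilde{c}\}\, P\{\tilde{\mathcal{C}} = \tilde{c} \mid L \ne 1,\, Y_2^n = y_2^n\}$
$= \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} 2\, P\{U^n(1) = u^n\}\, P\{X_1^n(2) = x_1^n\}\, P\{Y_3^n = y_3^n \mid L \ne 1,\, Y_2^n = y_2^n\}$
$\overset{(c)}{\le} \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} 2\, P\{U^n(1) = u^n\}\, P\{X_1^n(2) = x_1^n\} \cdot 2\, P\{Y_3^n = y_3^n \mid Y_2^n = y_2^n\}$, (10)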

where (a) follows since given $L \ne 1$, $U^n(1) \to (Y_2^n, \tilde{\mathcal{C}}) \to (Y_3^n, X_1^n(2))$ form a Markov chain, (b) follows since for
$n$ sufficiently large $P\{U^n(1) = u^n \mid L \ne 1,\, Y_2^n = y_2^n,\, \tilde{\mathcal{C}} = \tilde{c}\} \le 2\, P\{U^n(1) = u^n\}$ and $X_1^n(2)$ is independent of
$(Y_2^n, Y_3^n, \tilde{\mathcal{C}}, L)$, and (c) follows since for $n$ sufficiently large $P\{Y_3^n = y_3^n \mid L \ne 1,\, Y_2^n = y_2^n\} \le 2\, P\{Y_3^n = y_3^n \mid Y_2^n = y_2^n\}$.
The statements in (b) and (c) are established by Lim, Minero, and Kim in [5, Lemmas 1, 2]. Back to the upper bound on
$P(\mathcal{E}_3)$, by the joint typicality lemma and (10), we have

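$P(\mathcal{E}_3) = P\{(X_1^n(m), U^n(l), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m \ne 1,\, l \ne L\}$
$\le 4 \cdot 2^{nR}\, 2^{n\tilde{R}} \sum_{y_2^n} p(y_2^n) \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} P\{X_1^n(2) = x_1^n\}\, P\{U^n(1) = u^n\}\, p(y_3^n|y_2^n)$
$= 4 \cdot 2^{nR}\, 2^{n\tilde{R}} \sum_{(x_1^n, u^n, y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}} P\{X_1^n(2) = x_1^n\}\, P\{U^n(1) = u^n\}\, p(y_3^n)$
$\le 4 \cdot 2^{n(R + \tilde{R} - I(X_1; Y_3) - I(U; X_1, Y_3) + \delta(\epsilon))}$,

which tends to zero as $n \to \infty$ if $R + \tilde{R} < I(X_1; Y_3) + I(U; X_1, Y_3) - \delta(\epsilon)$. Eliminating $\tilde{R}$ and letting $n \to \infty$ completes
the proof. ■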
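For completeness, the elimination step can be written out as follows. This is a sketch of the routine algebra, ignoring the $\delta$ terms, and it uses only the rate conditions stated in the proof together with the Markov chain $X_1 \to Y_2 \to U$ implied by the pmf $p(x_1)p(u|y_2)$.

```latex
\begin{align*}
R &< I(X_1; Y_3) + I(U; X_1, Y_3) - \tilde{R}
   && \text{(bound on } P(\mathcal{E}_3)\text{)} \\
  &< I(X_1; Y_3) + I(U; X_1, Y_3) - I(U; Y_2)
   && \text{(covering: } \tilde{R} > I(U; Y_2)\text{)} \\
  &= I(X_1; Y_3) + I(U; X_1) + I(U; Y_3 | X_1) - I(U; Y_2)
   && \text{(chain rule)} \\
  &= I(X_1, U; Y_3) - \big[ I(U; X_1, Y_2) - I(U; X_1) \big]
   && \text{(} I(U; Y_2) = I(U; X_1, Y_2) \text{ since } X_1 \to Y_2 \to U \text{)} \\
  &= I(X_1, U; Y_3) - I(U; Y_2 | X_1).
\end{align*}
```

Together with the packing condition $R < I(X_1; U, Y_3)$ on the third term, this gives the two terms inside the minimum in (6).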



C. Gelfand-Pinsker Partial Decode-Forward Compress-Forward Lower Bound 

Finally, we further combine the hybrid coding scheme developed for the GP-CF lower bound with the partial
decode-forward coding scheme by El Gamal, Hassanpour, and Mammen [3].
Theorem 3 (GP-PDF-CF lower bound). The capacity of the noncausal relay channel is lower bounded as

$C_\infty \ge \max \min\{I(V, U; Y_3) + I(X_1; U, Y_3|V) - I(U; Y_2|V),$
$\qquad I(V; Y_2) + I(X_1; U, Y_3|V),$
$\qquad I(V; Y_2) + I(X_1; U, Y_3|V) + I(U; Y_3|V) - I(U; Y_2|V)\}$, (11)

where the maximum is over all pmfs $p(v, x_1)p(u|v, y_2)$ and functions $x_2(u, v, y_2)$.

Proof: In this coding scheme, message $m \in [1 : 2^{nR}]$ is divided into two independent parts $m'$ and $m''$, where
$m' \in [1 : 2^{nR'}]$, $m'' \in [1 : 2^{nR''}]$, and $R' + R'' = R$. For each message $m = (m', m'')$, we generate a sequence $x_1^n(m''|m')$
and a subcodebook $\mathcal{C}(m')$ of $2^{n\tilde{R}}$ sequences $u^n(l|m')$. To send message $m = (m', m'')$, the sender transmits $x_1^n(m''|m')$.
Upon receiving $y_2^n$ noncausally, the relay decodes the message $\tilde{m}'$, finds a $u^n(l|\tilde{m}') \in \mathcal{C}(\tilde{m}')$ that is jointly typical with
$(v^n(\tilde{m}'), y_2^n)$, and transmits $x_2^n(u^n(l|\tilde{m}'), v^n(\tilde{m}'), y_2^n)$. The receiver declares $\hat{m} = (\hat{m}', \hat{m}'')$ to be the message estimate if
$(x_1^n(\hat{m}''|\hat{m}'), u^n(l|\hat{m}'), v^n(\hat{m}'), y_3^n)$ are jointly typical for some $u^n(l|\hat{m}') \in \mathcal{C}(\hat{m}')$.
We now provide the details of the coding scheme.

Codebook generation: Fix $p(v, x_1)p(u|v, y_2)$ and $x_2(u, v, y_2)$ that attain the lower bound. Randomly and independently generate
$2^{nR'}$ sequences $v^n(m')$, $m' \in [1 : 2^{nR'}]$, each according to $\prod_{i=1}^n p_V(v_i)$. For each message $m' \in [1 : 2^{nR'}]$, randomly and
conditionally independently generate $2^{nR''}$ sequences $x_1^n(m''|m')$ and $2^{n\tilde{R}}$ sequences $u^n(l|m')$, each respectively according
to $\prod_{i=1}^n p_{X_1|V}(x_{1i}|v_i(m'))$ and $\prod_{i=1}^n p_{U|V}(u_i|v_i(m'))$, which form the subcodebook $\mathcal{C}(m')$. This defines the codebook
$\mathcal{C} = \{(v^n(m'), x_1^n(m''|m'), u^n(l|m'), x_2^n(u^n(l|m'), v^n(m'), y_2^n)) : m' \in [1 : 2^{nR'}],\, m'' \in [1 : 2^{nR''}],\, l \in [1 : 2^{n\tilde{R}}]\}$. The
codebook is revealed to all parties.

Encoding: To send message $m = (m', m'')$, the encoder transmits $x_1^n(m''|m')$.

Relay encoding: Upon receiving $y_2^n$, the relay finds the unique $\tilde{m}'$ such that $(v^n(\tilde{m}'), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. Then, it finds a
sequence $u^n(l|\tilde{m}') \in \mathcal{C}(\tilde{m}')$ such that $(u^n(l|\tilde{m}'), v^n(\tilde{m}'), y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. If there is more than one such index, it chooses one
of them at random. If there is no such index, it chooses an arbitrary index at random from $[1 : 2^{n\tilde{R}}]$. The relay then transmits
$x_{2i} = x_2(u_i(l|\tilde{m}'), v_i(\tilde{m}'), y_{2i})$ at time $i \in [1 : n]$.

Decoding: Let $\epsilon > \epsilon'$. Upon receiving $y_3^n$, the decoder declares that $\hat{m} = (\hat{m}', \hat{m}'') \in [1 : 2^{nR}]$ is sent if it is the unique
message such that $(x_1^n(\hat{m}''|\hat{m}'), u^n(l|\hat{m}'), v^n(\hat{m}'), y_3^n) \in \mathcal{T}_{\epsilon}^{(n)}$ for some $u^n(l|\hat{m}') \in \mathcal{C}(\hat{m}')$; otherwise, it declares an error.

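Analysis of the probability of error: We analyze the probability of error of message $M$ averaged over codes. Assume without
loss of generality that $M = (M', M'') = (1, 1)$. Let $\tilde{M}'$ be the decoded message at the relay and let $L$ denote the index of
the chosen $U^n$ codeword for $\tilde{M}'$. The decoder makes an error only if one of the following events occurs:

$\tilde{\mathcal{E}} = \{\tilde{M}' \ne 1\}$,
$\tilde{\mathcal{E}}_1 = \{(V^n(1), Y_2^n) \notin \mathcal{T}_{\epsilon'}^{(n)}\}$,
$\tilde{\mathcal{E}}_2 = \{(V^n(m'), Y_2^n) \in \mathcal{T}_{\epsilon'}^{(n)} \text{ for some } m' \ne 1\}$,
$\tilde{\mathcal{E}}_3 = \{(U^n(l|\tilde{M}'), V^n(\tilde{M}'), Y_2^n) \notin \mathcal{T}_{\epsilon'}^{(n)} \text{ for all } U^n(l|\tilde{M}') \in \mathcal{C}(\tilde{M}')\}$,
$\mathcal{E}_1 = \{(X_1^n(1|\tilde{M}'), U^n(L|\tilde{M}'), V^n(\tilde{M}'), Y_3^n) \notin \mathcal{T}_{\epsilon}^{(n)}\}$,
$\mathcal{E}_2 = \{(X_1^n(m''|1), U^n(L|1), V^n(1), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m'' \ne 1\}$,
$\mathcal{E}_3 = \{(X_1^n(m''|1), U^n(l|1), V^n(1), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m'' \ne 1,\, l \ne L, \text{ and } U^n(l|1) \in \mathcal{C}(1)\}$,
$\mathcal{E}_4 = \{(X_1^n(m''|m'), U^n(l|m'), V^n(m'), Y_3^n) \in \mathcal{T}_{\epsilon}^{(n)} \text{ for some } m' \ne 1,\, m'',\, U^n(l|m') \in \mathcal{C}(m')\}$.

By the union of events bound, the probability of error is upper bounded as

$P(\mathcal{E}) = P(\hat{M} \ne 1)$
$\le P(\tilde{\mathcal{E}} \cup \tilde{\mathcal{E}}_3 \cup \mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3 \cup \mathcal{E}_4)$
$\le P(\tilde{\mathcal{E}}) + P(\tilde{\mathcal{E}}_3 \cap \tilde{\mathcal{E}}^c) + P(\mathcal{E}_1 \cap \tilde{\mathcal{E}}^c \cap \tilde{\mathcal{E}}_3^c) + P(\mathcal{E}_2) + P(\mathcal{E}_3) + P(\mathcal{E}_4)$
$\le P(\tilde{\mathcal{E}}_1) + P(\tilde{\mathcal{E}}_2) + P(\tilde{\mathcal{E}}_3 \cap \tilde{\mathcal{E}}^c) + P(\mathcal{E}_1 \cap \tilde{\mathcal{E}}^c \cap \tilde{\mathcal{E}}_3^c) + P(\mathcal{E}_2) + P(\mathcal{E}_3) + P(\mathcal{E}_4)$.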

By the LLN, the first term tends to zero as $n \to \infty$. By the packing lemma, the second term tends to zero as $n \to \infty$
if $R' < I(V; Y_2) - \delta(\epsilon')$. Therefore, $P(\tilde{\mathcal{E}})$ tends to zero as $n \to \infty$ if $R' < I(V; Y_2) - \delta(\epsilon')$. Given $\tilde{\mathcal{E}}^c$, i.e., $\{\tilde{M}' = 1\}$,
by the covering lemma, the third term tends to zero as $n \to \infty$ if $\tilde{R} > I(U; Y_2|V) + \delta(\epsilon')$. By the conditional typicality
lemma, the fourth term tends to zero as $n \to \infty$. By the joint typicality lemma, the fifth term tends to zero as $n \to \infty$ if
$R'' < I(X_1; U, Y_3|V) - \delta(\epsilon)$. The last two terms require special attention because of the dependency between the index $L$
and the codebook $\mathcal{C} = \{(V^n(m'), X_1^n(m''|m'), U^n(l|m'), X_2^n(U^n(l|m'), V^n(m'), Y_2^n))\}$. With a similar argument as in the
analysis for $P(\mathcal{E}_3)$ in the proof via hybrid coding of Theorem 2, we can show that the last two terms tend to zero as $n \to \infty$
if $R'' + \tilde{R} < I(X_1; Y_3|V) + I(U; X_1, Y_3|V) - \delta(\epsilon)$ and $R' + R'' + \tilde{R} < I(V, X_1, U; Y_3) + I(X_1; U|V) - \delta(\epsilon)$, respectively.
Eliminating $R'$, $R''$, and $\tilde{R}$ and letting $n \to \infty$ completes the proof. ■

Remark 4. Setting $V = (V, X_2)$ and $U = \emptyset$, the GP-PDF-CF lower bound reduces to the PDF lower bound (3). Note that
the choice of $X_2$ gives the Markov chain $X_2 \to V \to Y_2$. Furthermore, setting $V = \emptyset$ and $U = \hat{Y}_2$, the GP-PDF-CF lower
bound reduces to the GP-CF lower bound (6). However, this lower bound does not recover the GP-DF lower bound (5) in
Theorem 1.

IV. An Improved Upper Bound 

In this section, we provide an improved upper bound on the capacity, which is tight for the class of degraded noncausal 
relay channels. We show through an example that the new upper bound can be strictly tighter than the cutset bound. 

Theorem 4. The capacity of the noncausal relay channel is upper bounded as

$$C_\infty \le R_{\mathrm{NUB}} = \max \min\{I(X_1; Y_2) + I(X_1; Y_3|X_2, Y_2),\; I(X_1, U; Y_3) - I(Y_2; U|X_1)\},$$

where the maximum is over all pmfs $p(x_1)p(u|x_1, y_2)$ and functions $x_2(u, x_1, y_2)$.

Proof: The first term in the upper bound follows from the cutset bound (4). To establish the second bound, identify $U_i = (M, Y_{2,i+1}^n, Y_3^{i-1})$. Let $Q \sim \mathrm{Unif}[1:n]$ be independent of $(U^n, X_1^n, Y_2^n, Y_3^n)$ and set $U = (U_Q, Q)$, $X_1 = X_{1Q}$, $Y_2 = Y_{2Q}$, $Y_3 = Y_{3Q}$. We have

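\begin{align*}
nR &= H(M) \\
&\overset{(a)}{\le} I(M;Y_3^n) + n\epsilon_n \\
&= \sum_{i=1}^n I(M;Y_{3i}|Y_3^{i-1}) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M,Y_3^{i-1};Y_{3i}) + n\epsilon_n \tag{12} \\
&= \sum_{i=1}^n \bigl[ I(M,Y_{2,i+1}^n,Y_3^{i-1};Y_{3i}) - I(Y_{2,i+1}^n;Y_{3i}|Y_3^{i-1},M) \bigr] + n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n \bigl[ I(M,Y_{2,i+1}^n,Y_3^{i-1};Y_{3i}) - I(Y_{2i};Y_3^{i-1}|Y_{2,i+1}^n,M) \bigr] + n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^n \bigl[ I(X_{1i},M,Y_{2,i+1}^n,Y_3^{i-1};Y_{3i}) - I(Y_{2i};Y_3^{i-1}|Y_{2,i+1}^n,M,X_{1i}) \bigr] + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^n \bigl[ I(X_{1i},M,Y_{2,i+1}^n,Y_3^{i-1};Y_{3i}) - I(Y_{2i};M,Y_{2,i+1}^n,Y_3^{i-1}|X_{1i}) \bigr] + n\epsilon_n \\
&= \sum_{i=1}^n \bigl[ I(X_{1i},U_i;Y_{3i}) - I(Y_{2i};U_i|X_{1i}) \bigr] + n\epsilon_n \\
&= n\bigl[ I(X_{1Q},U_Q;Y_{3Q}|Q) - I(Y_{2Q};U_Q|X_{1Q},Q) \bigr] + n\epsilon_n \\
&= n\bigl[ I(X_{1Q},U_Q,Q;Y_{3Q}) - I(Y_{2Q};U_Q,Q|X_{1Q}) \bigr] + n\epsilon_n \\
&= n\bigl[ I(X_1,U;Y_3) - I(Y_2;U|X_1) \bigr] + n\epsilon_n,
\end{align*}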

where (a) follows by Fano's inequality, (b) follows by the Csiszár sum identity, (c) follows since $X_{1i}$ is a function of $M$, and (d) follows since the channel $p(y_2|x_1)$ is memoryless and thus $(Y_{2,i+1}^n, M) \to X_{1i} \to Y_{2i}$ form a Markov chain.
Finally, we show that it suffices to maximize over $p(x_1)p(u|x_1,y_2)$ and functions $x_2(u,x_1,y_2)$. Consider a general pmf $p(x_1)p(x_2,u|x_1,y_2)$. By the functional representation lemma (TJ Appendix B], there exists a random variable $V$ independent of $(U,X_1,Y_2)$ such that $X_2$ is a function of $(U,X_1,Y_2,V)$. Now define $\tilde{U} = (U,V)$. Then

\begin{align*}
C_\infty &\le \max_{p(x_1)p(x_2,u|x_1,y_2)} \min\{ I(X_1;Y_2) + I(X_1;Y_3|X_2,Y_2),\; I(X_1,U;Y_3) - I(Y_2;U|X_1) \} \\
&\le \max_{\substack{p(x_1)p(u|x_1,y_2) \\ p(v),\, x_2(x_1,y_2,u,v)}} \min\{ I(X_1;Y_2) + I(X_1;Y_3|X_2,Y_2),\; I(X_1,U,V;Y_3) - I(Y_2;U,V|X_1) \} \\
&\le \max_{\substack{p(x_1)p(\tilde{u}|x_1,y_2) \\ x_2(x_1,y_2,\tilde{u})}} \min\{ I(X_1;Y_2) + I(X_1;Y_3|X_2,Y_2),\; I(X_1,\tilde{U};Y_3) - I(Y_2;\tilde{U}|X_1) \}.
\end{align*}

Thus, there is no loss of generality in restricting $X_2$ to be a function of $(U,X_1,Y_2)$. ■

Remark 5. This upper bound is always at least as tight as the cutset bound. To see this, note that the new upper bound is equivalent to expression (12), with all the remaining steps being equalities. On the other hand, the cutset bound can be derived from (12) as

\begin{align*}
nR &\le \sum_{i=1}^n I(M,Y_3^{i-1};Y_{3i}) + n\epsilon_n \\
&\overset{(a)}{\le} \sum_{i=1}^n I(X_{1i},X_{2i},M,Y_3^{i-1};Y_{3i}) + n\epsilon_n \\
&= \sum_{i=1}^n I(X_{1i},X_{2i};Y_{3i}) + n\epsilon_n \\
&= nI(X_1,X_2;Y_3) + n\epsilon_n,
\end{align*}

where (a) can be loose in general.
Theorem 5. The capacity of the degraded noncausal relay channel $p(y_2|x_1)p(y_3|x_1,x_2,y_2) = p(y_2|x_1)p(y_3|x_2,y_2)$ is

\[ C_\infty = \max \min\{ I(X_1;Y_2),\; I(X_1,U;Y_3) - I(Y_2;U|X_1) \}, \tag{13} \]
where the maximum is over all pmfs $p(x_1)p(u|x_1,y_2)$ and functions $x_2(u,x_1,y_2)$.

Proof: In the degraded case, we have $I(X_1;Y_2) + I(X_1;Y_3|X_2,Y_2) = I(X_1;Y_2)$. Thus, the GP-DF lower bound in Theorem 1 and the improved upper bound in Theorem 4 coincide. ■
The improved upper bound on the capacity of the noncausal relay channel can be strictly tighter than the cutset bound. In the following, we provide an example, motivated by [4, Example 2], where $R_{\mathrm{DF}} < R_{\mathrm{GP\text{-}DF}} = C_\infty = R_{\mathrm{NUB}} < R_{\mathrm{CS}}$.

Example 2. Consider a degraded noncausal relay channel $p(y_2|x_1)p(y_3|x_1,x_2,y_2) = p(y_2|x_1)p(y_3|x_2,y_2)$ as depicted in Figure 5. The channel from the sender to the relay is BSC($p_1$), while the channel from the relay to the receiver is BSC($p_2$) if $Y_2 = 0$ and BSC($p_3$) if $Y_2 = 1$.




Fig. 5. Channel statistics of the degraded noncausal relay channel 

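When $p_1 = 0.2$, $p_2 = 0.1$, and $p_3 = 0.55$, we have

\begin{align*}
R_{\mathrm{DF}} &= 0.2203, \\
R_{\mathrm{CS}} &= 0.2566, \\
R_{\mathrm{GP\text{-}DF}} = C_\infty = R_{\mathrm{NUB}} &= 0.2453.
\end{align*}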

The DF lower bound and cutset bound expressions (2) and (4) contain no auxiliary random variable and thus can be computed easily. In the capacity expression (13) in Theorem 5, the maximum is attained by $U \sim \mathrm{Bern}(1/2)$ independent of $(X_1,Y_2)$ and $X_2 = U \oplus Y_2$, which yields the capacity $C_\infty = 0.2453$.

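We prove this via a symmetrization argument motivated by Nair Q. Note that

\begin{align*}
C_\infty &= \max_{\substack{p(x_1)p(u|x_1,y_2) \\ x_2(u,x_1,y_2)}} \min\{ I(X_1;Y_2),\; I(X_1,U;Y_3) - I(U;Y_2|X_1) \} \\
&= \max_{p(x_1)} \min\Bigl\{ I(X_1;Y_2),\; \max_{\substack{p(u|x_1,y_2) \\ x_2(u,x_1,y_2)}} \bigl( I(X_1,U;Y_3) - I(U;Y_2|X_1) \bigr) \Bigr\}. \tag{14}
\end{align*}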

Consider the maximum in the second term for a fixed $p(x_1)$. Assume without loss of generality that $\mathcal{U} = \{1, 2, \ldots, |\mathcal{U}|\}$. For any conditional pmf $p_{U|X_1,Y_2}(u|x_1,y_2)$ and function $x_2(u,x_1,y_2)$, define $\tilde{U}$, $\tilde{X}_2$, and $\tilde{Y}_3$ as

\begin{align*}
p_{\tilde{U}}(u) = p_{\tilde{U}}(-u) &= \tfrac{1}{2}\, p_U(u), \quad u \in \mathcal{U}, \\
p_{X_1,Y_2|\tilde{U}}(x_1,y_2|u) = p_{X_1,Y_2|\tilde{U}}(x_1,y_2|-u) &= p_{X_1,Y_2|U}(x_1,y_2|u), \quad u \in \mathcal{U}, \\
\tilde{x}_2(u,x_1,y_2) = 1 - \tilde{x}_2(-u,x_1,y_2) &= x_2(u,x_1,y_2), \quad (u,x_1,y_2) \in \mathcal{U} \times \{0,1\}^2, \tag{15} \\
p_{\tilde{Y}_3|\tilde{X}_2,Y_2}(y_3|x_2,y_2) &= p_{Y_3|X_2,Y_2}(y_3|x_2,y_2), \quad y_3 \in \{0,1\}.
\end{align*}

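Then for any $u \in \mathcal{U}$,

\begin{align*}
p_{\tilde{U}|X_1,Y_2}(u|x_1,y_2) = p_{\tilde{U}|X_1,Y_2}(-u|x_1,y_2) &= \tfrac{1}{2}\, p_{U|X_1,Y_2}(u|x_1,y_2), \tag{16} \\
p_{Y_2|X_1,\tilde{U}}(y_2|x_1,u) = p_{Y_2|X_1,\tilde{U}}(y_2|x_1,-u) &= p_{Y_2|X_1,U}(y_2|x_1,u), \\
p_{\tilde{Y}_3|X_1,\tilde{U}}(y_3|x_1,u) = 1 - p_{\tilde{Y}_3|X_1,\tilde{U}}(y_3|x_1,-u) &= p_{Y_3|X_1,U}(y_3|x_1,u). \tag{17}
\end{align*}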

Thus, $H(Y_2|X_1,U=u) = H(Y_2|X_1,\tilde{U}=u) = H(Y_2|X_1,\tilde{U}=-u)$ for all $u \in \mathcal{U}$, which implies that $H(Y_2|X_1,U) = H(Y_2|X_1,\tilde{U})$. Similarly, we can show $H(Y_3|X_1,U) = H(\tilde{Y}_3|X_1,\tilde{U})$. It can also be easily shown that for any $y_2 \in \{0,1\}$, $p_{\tilde{X}_2|Y_2}(0|y_2) = 1/2$, which implies that $p_{\tilde{Y}_3}(0) = 1/2$ and $H(\tilde{Y}_3) = 1$. Hence,

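\begin{align*}
I(X_1,U;Y_3) - I(U;Y_2|X_1) &= H(Y_3) - H(Y_3|X_1,U) - H(Y_2|X_1) + H(Y_2|X_1,U) \\
&\le H(\tilde{Y}_3) - H(\tilde{Y}_3|X_1,\tilde{U}) - H(Y_2|X_1) + H(Y_2|X_1,\tilde{U}) \\
&= \sum_{u=1}^{|\mathcal{U}|} p_U(u) \bigl( H(Y_2|X_1,\tilde{U},|\tilde{U}|=u) - H(\tilde{Y}_3|X_1,\tilde{U},|\tilde{U}|=u) \bigr) + H(\tilde{Y}_3) - H(Y_2|X_1) \\
&\le \max_u \bigl( H(Y_2|X_1,\tilde{U},|\tilde{U}|=u) - H(\tilde{Y}_3|X_1,\tilde{U},|\tilde{U}|=u) \bigr) + 1 - H(p_1),
\end{align*}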

where the last maximum is attained by $p_{\tilde{U}}(u) = p_{\tilde{U}}(-u) = 1/2$ for a single $u$. Note that from our definition of $\tilde{U}$, this automatically guarantees the independence between $\tilde{U}$ and $(X_1,Y_2)$. Therefore, the maximum in the second term of (14) is attained by $U \sim \mathrm{Bern}(1/2)$ independent of $(X_1,Y_2)$. Subsequently, we relabel $\tilde{U}$ as $U$ with alphabet $\{0,1\}$, $\tilde{Y}_3$ as $Y_3$, and $\tilde{X}_2$ as $X_2$.

Now we further optimize the second term in (14), which we have simplified as

\begin{align*}
I(X_1,U;Y_3) - I(U;Y_2|X_1) &\overset{(a)}{=} I(X_1,U;Y_3) \\
&\overset{(b)}{=} 1 - H(Y_3|X_1,U), \tag{18}
\end{align*}

where (a) follows by the optimal choice of $U$ independent of $(X_1,Y_2)$ and (b) follows since $H(Y_3) = 1$. We maximize (18) over all functions $x_2(u,x_1,y_2)$ satisfying $x_2(0,x_1,y_2) = 1 - x_2(1,x_1,y_2)$ for all $(x_1,y_2) \in \{0,1\}^2$. By the symmetry of $U$ described above, $H(Y_3|X_1,U=0) = H(Y_3|X_1,U=1)$. Thus,

\begin{align*}
H(Y_3|X_1,U) &= p_U(0) H(Y_3|X_1,U=0) + p_U(1) H(Y_3|X_1,U=1) \\
&= H(Y_3|X_1,U=0) \\
&= p_{X_1}(0) H(Y_3|X_1=0,U=0) + p_{X_1}(1) H(Y_3|X_1=1,U=0).
\end{align*}
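The two facts used in (18), namely $I(U;Y_2|X_1) = 0$ and $H(Y_3) = 1$, can be checked numerically for the parameters of Example 2. The following Python sketch is a sanity check only, not part of the proof. It assumes the relay function $X_2 = U \oplus Y_2$ (one function satisfying the constraint above) and builds the joint pmf of $(X_1, U, Y_2, Y_3)$ by enumeration. The helper names `joint_pmf`, `entropy`, and `mutual_information`, and the test value `a = 0.3` for $P\{X_1 = 1\}$, are illustrative choices that do not appear in the paper.

```python
from itertools import product
from math import log2

X1, U, Y2, Y3 = range(4)  # coordinate positions inside each outcome tuple


def joint_pmf(a, p1=0.2, p2=0.1, p3=0.55):
    """Joint pmf of (X1, U, Y2, Y3) with X1 ~ Bern(a), U ~ Bern(1/2)
    independent of (X1, Y2), and the assumed relay input X2 = U xor Y2."""
    pmf = {}
    for x1, u, y2, y3 in product((0, 1), repeat=4):
        p_x1 = a if x1 == 1 else 1 - a
        p_y2 = p1 if y2 != x1 else 1 - p1   # BSC(p1) from sender to relay
        x2 = u ^ y2
        flip = p2 if y2 == 0 else p3        # BSC(p2) or BSC(p3), selected by Y2
        p_y3 = flip if y3 != x2 else 1 - flip
        pmf[(x1, u, y2, y3)] = p_x1 * 0.5 * p_y2 * p_y3
    return pmf


def entropy(pmf, coords):
    """Entropy in bits of the marginal on the given coordinate positions."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in coords)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mutual_information(pmf, first, second, given=()):
    """I(first; second | given) = H(f,g) + H(s,g) - H(f,s,g) - H(g)."""
    return (entropy(pmf, first + given) + entropy(pmf, second + given)
            - entropy(pmf, first + second + given) - entropy(pmf, given))


pmf = joint_pmf(a=0.3)
print(mutual_information(pmf, (U,), (Y2,), (X1,)))  # I(U;Y2|X1), should vanish
print(entropy(pmf, (Y3,)))                          # H(Y3), should be 1 bit
print(mutual_information(pmf, (X1, U), (Y3,)))      # left-hand side of (18)
print(1 - (entropy(pmf, (X1, U, Y3)) - entropy(pmf, (X1, U))))  # right-hand side
```

The last two printed values should agree up to floating-point error, since $H(Y_3|X_1,U) = H(X_1,U,Y_3) - H(X_1,U)$.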

By considering all four functions $x_2(u=0, x_1=0, y_2) \in \{\{0,1\} \to \{0,1\}\}$ and removing the redundant choices by the symmetry of the binary entropy function, we have (here $\bar{p} = 1 - p$)

\[ H(Y_3|X_1=0,U=0) \ge \min\{ H(\bar{p}_1 p_2 + p_1 p_3),\; H(\bar{p}_1 p_2 + p_1 \bar{p}_3) \}. \]

Similarly,

\[ H(Y_3|X_1=1,U=0) \ge \min\{ H(p_1 p_2 + \bar{p}_1 p_3),\; H(p_1 p_2 + \bar{p}_1 \bar{p}_3) \}. \]
When $p_1 = 0.2$, $p_2 = 0.1$, and $p_3 = 0.55$, the minimum is attained by $X_2 = U \oplus Y_2$ for both terms regardless of $p(x_1)$. Therefore the second term in (14) simplifies to

\[ 1 - p_{X_1}(0) H(\bar{p}_1 p_2 + p_1 \bar{p}_3) - p_{X_1}(1) H(p_1 p_2 + \bar{p}_1 \bar{p}_3). \]
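The choice of the minimizing relay function can be checked by brute force. The sketch below is a hedged illustration, not the authors' code. It enumerates all four maps $y_2 \mapsto x_2$ for $U = 0$ and a fixed $x_1$, and computes $H(Y_3|X_1 = x_1, U = 0)$ for each. The function names `binary_entropy` and `relay_function_entropies` are hypothetical, and a map is encoded as the pair `(f(0), f(1))`, so `(0, 1)` corresponds to $X_2 = Y_2 = U \oplus Y_2$ when $U = 0$.

```python
from itertools import product
from math import log2


def binary_entropy(p):
    """Binary entropy function in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def relay_function_entropies(x1, p1=0.2, p2=0.1, p3=0.55):
    """Map each relay function (f(0), f(1)) to H(Y3 | X1 = x1, U = 0)."""
    results = {}
    for f in product((0, 1), repeat=2):      # f[y2] is the relay input X2
        prob_y3_is_1 = 0.0
        for y2 in (0, 1):
            p_y2 = p1 if y2 != x1 else 1 - p1    # BSC(p1) from sender to relay
            flip = p2 if y2 == 0 else p3         # crossover probability set by Y2
            prob_y3_is_1 += p_y2 * (flip if f[y2] == 0 else 1 - flip)
        results[f] = binary_entropy(prob_y3_is_1)
    return results


for x1 in (0, 1):
    table = relay_function_entropies(x1)
    for f, h in sorted(table.items()):
        print(f"x1={x1}, (f(0), f(1))={f}: H(Y3|X1,U=0) = {h:.4f}")
```

For both values of $x_1$ the identity map `(0, 1)` and its complement `(1, 0)` should give the smallest entropy, and they tie by the symmetry of the binary entropy function.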

Finally, maximizing

\[ \min\{ I(X_1;Y_2),\; 1 - p_{X_1}(0) H(\bar{p}_1 p_2 + p_1 \bar{p}_3) - p_{X_1}(1) H(p_1 p_2 + \bar{p}_1 \bar{p}_3) \} \]
over $p(x_1)$, we obtain the capacity $C_\infty = 0.2453$.
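This final maximization is over the single parameter $a = p_{X_1}(1)$, so it can be approximated by a grid search. The following sketch is one way to do this under stated assumptions: it uses $I(X_1;Y_2) = H(a\bar{p}_1 + \bar{a}p_1) - H(p_1)$ for the BSC($p_1$) link and the simplified second term given above. The function name `capacity_example2` and the grid size are illustrative choices.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy function in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def capacity_example2(p1=0.2, p2=0.1, p3=0.55, grid=100001):
    """Grid search over a = P{X1 = 1} of min{first term, second term}."""
    h0 = binary_entropy((1 - p1) * p2 + p1 * (1 - p3))        # X1 = 0 branch
    h1 = binary_entropy(p1 * p2 + (1 - p1) * (1 - p3))        # X1 = 1 branch
    best_rate, best_a = 0.0, 0.0
    for k in range(grid):
        a = k / (grid - 1)
        # I(X1;Y2) for a BSC(p1) with input X1 ~ Bern(a)
        first = binary_entropy(a * (1 - p1) + (1 - a) * p1) - binary_entropy(p1)
        # Simplified second term: 1 - H(Y3 | X1, U)
        second = 1 - (1 - a) * h0 - a * h1
        rate = min(first, second)
        if rate > best_rate:
            best_rate, best_a = rate, a
    return best_rate, best_a


rate, a_star = capacity_example2()
print(f"approximate capacity: {rate:.4f} bits at P(X1 = 1) = {a_star:.4f}")
```

Up to grid resolution and rounding, the returned rate should agree with the value $0.2453$ reported above. The maximizer lies where the two terms cross, because the first term increases and the second decreases in $a$ on $[0, 1/2]$.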